Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS specialist lexicon.

نویسندگان

  • Yang Huang
  • Henry J Lowe
  • Dan Klein
  • Russell J Cucina
چکیده

OBJECTIVE The aim of this study was to develop and evaluate a method of extracting noun phrases with full phrase structures from a set of clinical radiology reports using natural language processing (NLP) and to investigate the effects of using the UMLS(R) Specialist Lexicon to improve noun phrase identification within clinical radiology documents. DESIGN The noun phrase identification (NPI) module is composed of a sentence boundary detector, a statistical natural language parser trained on a nonmedical domain, and a noun phrase (NP) tagger. The NPI module processed a set of 100 XML-represented clinical radiology reports in Health Level 7 (HL7)(R) Clinical Document Architecture (CDA)-compatible format. Computed output was compared with manual markups made by four physicians and one author for maximal (longest) NP and those made by one author for base (simple) NP, respectively. An extended lexicon of biomedical terms was created from the UMLS Specialist Lexicon and used to improve NPI performance. RESULTS The test set was 50 randomly selected reports. The sentence boundary detector achieved 99.0% precision and 98.6% recall. The overall maximal NPI precision and recall were 78.9% and 81.5% before using the UMLS Specialist Lexicon and 82.1% and 84.6% after. The overall base NPI precision and recall were 88.2% and 86.8% before using the UMLS Specialist Lexicon and 93.1% and 92.6% after, reducing false-positives by 31.1% and false-negatives by 34.3%. CONCLUSION The sentence boundary detector performs excellently. After the adaptation using the UMLS Specialist Lexicon, the statistical parser's NPI performance on radiology reports increased to levels comparable to the parser's native performance in its newswire training domain and to that reported by other researchers in the general nonmedical domain.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using a Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon to Assign SNOMED CT Codes to Anatomic Sites and Pathologic Diagnoses in Full Text Pathology Reports

To address the problem of extracting structured information from pathology reports for research purposes in the STRIDE Clinical Data Warehouse, we adapted the ChartIndex Medical Language Processing system to automatically identify and map anatomic and diagnostic noun phrases found in full-text pathology reports to SNOMED CT concept descriptors. An evaluation of the system's performance showed a...

متن کامل

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

متن کامل

Indexing Anatomical Phrases in Neuro-Radiology Reports to the UMLS 2005AA

This work describes a methodology to index anatomical phrases to the 2005AA release of the Unified Medical Language System (UMLS). A phrase chunking tool based on Natural Language Processing (NLP) was developed to identify semantically coherent phrases within medical reports. Using this phrase chunker, a set of 2,551 unique anatomical phrases was extracted from brain radiology reports. These ph...

متن کامل

A semantic lexicon for medical language processing.

OBJECTIVE Construction of a resource that provides semantic information about words and phrases to facilitate the computer processing of medical narrative. DESIGN Lexemes (words and word phrases) in the Specialist Lexicon were matched against strings in the 1997 Metathesaurus of the Unified Medical Language System (UMLS) developed by the National Library of Medicine. This yielded a "semantic ...

متن کامل

Clustering UMLS Semantic Relations Between Medical Concepts

We propose and implement an innovative semi-supervised framework for automatically discovering UMLS semantic relations. Our proposed framework uses semantic, syntactic and ortho-graphic features both at global level and local level. We experimented with multiple distance metric for clustering including Euclidean distance, spherical k-means distance, and Kullback-Leibler divergence. We show that...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of the American Medical Informatics Association : JAMIA

دوره 12 3  شماره 

صفحات  -

تاریخ انتشار 2005